SpreadsheetML Reference Material - Table of Contents
A workbook may contain thousands of cells containing string (non-numeric) data, and this data is likely to be repeated across many rows or columns. A single string table shared across the workbook improves performance when opening and saving the file, because each repeated string is read and written only once.
Consider, for example, a workbook summarizing information for cities within various countries. It may have a column for the name of the country, a column for the name of each city in that country, and a column containing the data for each city. Here the country name is repeated in many cells. When such repetition is extensive, using a shared string table when saving the workbook yields substantial savings: instead of storing the full string, each such cell stores an index into the string table as its value.
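To make the index mechanism concrete, here is a minimal Python sketch (not taken from any Office library) that deduplicates string cell values into a table and records the index each cell would store in place of its text. The function name `build_shared_strings` and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple


def build_shared_strings(cells: List[str]) -> Tuple[List[str], List[int], int]:
    """Deduplicate string cell values into a shared string table.

    Returns (table, indices, count): table holds each distinct string once
    (its length corresponds to uniqueCount), indices gives the table index
    each cell stores instead of its text, and count is the total number of
    string references.
    """
    table: List[str] = []
    position: Dict[str, int] = {}  # string -> index in table
    indices: List[int] = []
    for text in cells:
        if text not in position:
            position[text] = len(table)
            table.append(text)
        indices.append(position[text])
    return table, indices, len(indices)


# Hypothetical country/city column values, flattened into one list
cells = ["United States", "New York", "United States", "Chicago",
         "United States", "Boston", "Canada", "Toronto"]
table, indices, count = build_shared_strings(cells)
print(table)               # each distinct string appears once
print(indices)             # what each cell would store
print(count, len(table))   # count vs. uniqueCount
```

"United States" is written to the table once, and the three cells that use it all store index 0.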
The shared string table contains all the necessary information for displaying the string: the text, formatting properties, and phonetic properties (for East Asian languages).
Most strings in a workbook have formatting applied at the cell level; that is, the same formatting applies to the entire string in the cell. In these cases, the cell's formatting is stored in the styles part, and the string itself is stored in the shared string table as a simple text element, as the following XML illustrates.
[Example:
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/5/main"
     count="8" uniqueCount="2">
  <si>
    <t>United States</t>
  </si>
  <si>
    <t>New York</t>
  </si>
</sst>
end example]
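In this example, the string table is simply a collection of string items (si), each consisting of a single text element (t). The count attribute gives the total number of string references in the workbook, and uniqueCount gives the number of unique entries (si elements) in the table. Numeric data in the workbook does not appear in the shared string table.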
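A part like this can be produced with Python's standard `xml.etree.ElementTree`. The sketch below is a minimal illustration under stated assumptions: the `write_sst` helper is hypothetical, it emits only plain `<t>` items (no rich-text runs), and it uses the namespace URI from the example as an assumption that should be matched to the namespace your files actually declare.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace URI copied from the example (an assumption; use the one your files declare)
SST_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/5/main"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

# Serialize SST_NS as the default namespace (xmlns="...") instead of an ns0: prefix
ET.register_namespace("", SST_NS)


def write_sst(table: List[str], count: int) -> str:
    """Serialize plain strings as an <sst> element.

    count is the total number of string references in the workbook;
    uniqueCount is the number of <si> items, i.e. len(table).
    """
    sst = ET.Element(f"{{{SST_NS}}}sst",
                     {"count": str(count), "uniqueCount": str(len(table))})
    for text in table:
        si = ET.SubElement(sst, f"{{{SST_NS}}}si")
        t = ET.SubElement(si, f"{{{SST_NS}}}t")
        t.text = text
        # Mark leading/trailing whitespace as significant via xml:space="preserve"
        if text != text.strip():
            t.set(XML_SPACE, "preserve")
    return ET.tostring(sst, encoding="unicode")


print(write_sst(["United States", "New York"], count=8))
```

uniqueCount is derived from the table rather than passed in, so it cannot disagree with the number of si elements written.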
Some strings in the workbook may have formatting applied at a finer granularity than the cell level. For instance, specific characters within the string may be bold, italic, or colored. In these cases, the formatting is stored along with the text in the string table, and each distinct combination of text and formatting is stored as a separate entry. The following XML illustrates this.
[Example:
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/5/main"
     count="8" uniqueCount="2">
  <si>
    <r>
      <t xml:space="preserve">United </t>
    </r>
    <r>
      <rPr>
        <sz val="11"/>
        <color rgb="FFFF0000"/>
        <rFont val="Calibri"/>
        <family val="2"/>
        <scheme val="minor"/>
      </rPr>
      <t>States</t>
    </r>
  </si>
  <si>
    <t>New York</t>
  </si>
</sst>
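In this example, the string "United States" is stored as two runs (r elements). The first run, "United ", has no run properties of its own, and its xml:space="preserve" attribute keeps the trailing space. The second run, "States", has run properties (rPr) that format it as 11-point red (ARGB FFFF0000) Calibri text. end example]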
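Reading a rich-text item back amounts to concatenating the `<t>` of each run in order. The following sketch, again using `xml.etree.ElementTree`, is a minimal illustration: the `read_si` helper and the `SAMPLE` string are hypothetical, the namespace URI is taken from the example as an assumption, and only each run's color is extracted (other run properties and any phonetic data are ignored).

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Namespace URI copied from the example (an assumption)
SST_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/5/main"
NS = {"s": SST_NS}

Run = Tuple[str, Optional[str]]  # (run text, rgb color or None)


def read_si(si: ET.Element) -> Tuple[str, List[Run]]:
    """Return (plain_text, runs) for one <si> string item."""
    # Plain item: a single <t> directly under <si>. find() only matches
    # direct children, so <t> nested inside phonetic runs is skipped.
    plain = si.find("s:t", NS)
    if plain is not None:
        text = plain.text or ""
        return text, [(text, None)]
    # Rich-text item: collect the text and color of each <r> run in order
    runs: List[Run] = []
    for r in si.findall("s:r", NS):
        t = r.find("s:t", NS)
        color = r.find("s:rPr/s:color", NS)
        text = (t.text or "") if t is not None else ""
        rgb = color.get("rgb") if color is not None else None
        runs.append((text, rgb))
    return "".join(text for text, _ in runs), runs


SAMPLE = f"""<sst xmlns="{SST_NS}" count="8" uniqueCount="2">
  <si>
    <r><t xml:space="preserve">United </t></r>
    <r>
      <rPr><sz val="11"/><color rgb="FFFF0000"/><rFont val="Calibri"/></rPr>
      <t>States</t>
    </r>
  </si>
  <si><t>New York</t></si>
</sst>"""

for item in ET.fromstring(SAMPLE).findall("s:si", NS):
    print(read_si(item))
```

Treating a plain item as a single unformatted run lets callers handle both forms of si through one return type.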